FOREWORD


This report presents the results of a study of the specifications 
for an information system intended to support the design, production 
and maintenance of large computer programming systems. Called 
Evolutionary System for Data Processing, or ESDP, it was begun as an 
internal IBM project in 1965 by the Center for Exploratory Studies 
of the Federal Systems Division and continued under Air Force 
sponsorship during 1967 and early 1968. 

This work has been performed under contract number F19628-67-C0254
for the Electronic Systems Division, U.S. Air Force Systems
Command. The project monitor was Mr. John Goodenough, ESLFE. 

The authors wish to express their appreciation for the encouragement
and assistance provided by Dr. John Egan, formerly of ESD, and
their colleagues Dr. Harlan D. Mills and Mr. Michael Dyer. 

This report is in four volumes: Volume 1, System Description;
Volume 2, Control and Use of the System; Volume 3, The CAINT Executive
Language and Instruction Generator; and Volume 4, Programming
Specifications. This report was submitted on January 31, 1968.

This report has been reviewed and is approved. 




WILLIAM F. HEISLER, Col, USAF 
Chief, Command Systems Division 


Project Officer 

ABSTRACT


ESDP is a proposed system whose purpose is to acquire,
store, retrieve, publish and disseminate all documentation,
exclusive of graphics, concerned with a large computer
programming activity. Documentation is deemed to consist, not
only of final or formally published after-the-fact reports, but
of working files, design and change notices, informal drafts,
management reports--in fact, the entire recordable rationale
underlying a programming system. Maximum attention has been
concentrated on the means of acquiring and organizing
documentation. Two major, complementary approaches are proposed.
The first is called Program Analysis and is a process of
documentation directly from completed programs. The
second is called Computer Assisted Interrogation and is a process
of eliciting information directly from the programmer, through
on-line communication terminals. The former provides canonical
data about the program's structure. The latter provides
explanatory material about all aspects of the program, and in the
absence of canonical data, may provide tentative structural
information as well. The conclusion of the study group is that
ESDP is a feasible concept with present-day technology and that
it will materially benefit using organizations in the production
of programs and in guiding their evolution as requirements
change. Its value will be greater for larger organizations,
whose internal communications difficulties tend to cause truly
gigantic inefficiencies. Its implementation as a support system
for such projects would require a significant quantum of
investment in order to produce these benefits and is predicated
on the use of a computer system dedicated solely to the use of
ESDP.

SYSTEM DESCRIPTION


1. General Approach to Programming. The general architecture
proposed for the ESDP system is that used for Operating
System/360-Queued Telecommunication Access Method (OS-QTAM) (see
Figure 1). In such a system, terminals communicate with the 
central processing unit via telephone lines and a multiplexor 
channel. In the central processing unit, two or more programs 
are operating asynchronously in separate partitions of high speed 
memory under control of the OS supervisor. 

In one partition, the Message Control Program plus some 
additional QTAM code dispatches incoming and outgoing messages. 
The Message Control Program makes use of core buffers (the number 
and size being specified by the programmer) plus message queue 
storage on a direct access storage device. 

In the other partitions are the Message Processing
Programs. These programs perform all the ESDP processing
functions. They receive messages from and transmit messages to 
the Message Control Program via GET and PUT macro commands. When 
a message has been received, an ESDP controller, one of the 
Message Processing Programs, must first determine what activity 
the sender is involved in. For instance, it must recognize 
whether a message is a response to a question in interrogation 
or, say, a query. Once this determination has been made, some 
type of housekeeping, depending on the particular message and 
activity, is performed to initialize the ESDP functional 
routines. Program control is then switched to the particular 
module of programming required to perform the desired activity. 
These modules interact with the system files and issue messages
back to the terminals via PUT commands.
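
The controller's routing step described above can be sketched as follows. This is an illustrative Python outline, not QTAM code: the `Message` fields, the activity names, the `pending_interrogation` set and the two handler functions are all assumptions made for the example. Only the overall GET, classify, dispatch, PUT flow is taken from the text.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    terminal: str  # hypothetical terminal identifier
    text: str


# Hypothetical state: terminals currently answering an interrogation question.
pending_interrogation = {"T1"}


def classify(msg):
    """Decide which activity the sender is involved in (assumed rule)."""
    if msg.terminal in pending_interrogation:
        return "interrogation_response"
    return "query"


def handle_response(msg):
    return f"recorded answer from {msg.terminal}: {msg.text}"


def handle_query(msg):
    return f"query from {msg.terminal}: {msg.text}"


HANDLERS = {
    "interrogation_response": handle_response,
    "query": handle_query,
}


def controller(inbound, outbound):
    """Stand-in for the GET / dispatch / PUT cycle of the ESDP controller."""
    while inbound:
        msg = inbound.popleft()                 # plays the role of GET
        activity = classify(msg)                # which activity is this?
        reply = HANDLERS[activity](msg)         # switch to the module
        outbound.append((msg.terminal, reply))  # plays the role of PUT


inbound = deque([Message("T1", "YES"), Message("T2", "FIND UOP SORT1")])
outbound = []
controller(inbound, outbound)
```

A table of handlers keyed by activity keeps the controller itself free of activity-specific logic, which matches the separation between the controller and the functional routines described above.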

In addition to providing the capabilities outlined 
above, the operating system for ESDP must be concerned with the 
following requirements: (1) More than one user terminal may be 
communicating with any one program module at a time. This 
requirement may best be met by assuring that the program modules 
are reentrant. (Note that in our current experimental work we
have used PL/I which produces reentrant code.) (2) Different
user terminals may be communicating with different program
modules at the same time. We feel that this requirement can 
probably be met by a multi-tasking supervisor such as that now 
used in OS/360 with Multiprogramming with a Variable Number of 
Tasks (MVT). This will provide for a primitive form of time 
sharing by activating tasks whenever an I/O operation occurs,
making use of a priority system for the tasks. For the system
described, this type of time sharing should suffice, since there 
should be no periods of long processing, uninterrupted by I/O 
commands. (3) More than one user terminal may be accessing any 
one data element at the same time. This will require that some 
form of data base lockout be placed into the system. 
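
Requirement (3) can be illustrated with a very small lock table. The report does not specify a lockout mechanism, so the sketch below is only one possible reading: the `LockTable` class, its method names and the element identifier `UOP-17` are hypothetical, and a real multi-tasking implementation would also need to make these operations atomic.

```python
class LockTable:
    """Tracks which terminal currently holds each data element."""

    def __init__(self):
        self._owner = {}

    def acquire(self, element, terminal):
        """Grant the lock if the element is free or already held by this terminal."""
        holder = self._owner.setdefault(element, terminal)
        return holder == terminal

    def release(self, element, terminal):
        """Release the lock only if this terminal is the holder."""
        if self._owner.get(element) == terminal:
            del self._owner[element]
            return True
        return False


locks = LockTable()
first = locks.acquire("UOP-17", "T1")   # T1 gets the element
second = locks.acquire("UOP-17", "T2")  # T2 is locked out
locks.release("UOP-17", "T1")
third = locks.acquire("UOP-17", "T2")   # now T2 may proceed
```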

Figure 1. General ESDP System Concept

The general concept of ESDP is for a teleprocessed
terminal-oriented system. The terminals themselves should 
comprise the following: 

a. A cathode ray tube display with keyboard entry.

Most of the conversational processes will be performed
through this device. Light pen capability and/or vector drawing
capability may be desired depending on the need for activities
such as production of graphics in the documentation. For the
strictly conversational documentation activities, generation of
an average-sized character set (e.g., 64 character set including
numbers and upper case letters) on the face of the CRT should
suffice.

b. Hard copy printer. 

It is often desirable to retain a hard copy of that which
has been displayed on the CRT. This can be accomplished via a
typewriter type printer (without keyboard), the printing being
activated by command from the keyboard associated with the CRT.

c. Line printer

Line printing should be centralized so that high volume
outputs can be generated in the machine room for subsequent
manual transmittal to the requesting user.

d. Terminal polling

A round-robin polling system with priorities such as
that used by QTAM seems appropriate. Of course, if inefficiency
results, perhaps the priority scheme should be revised so as to
be based on the particular activity, for instance, rather than
simply the terminal identification.

It is anticipated that during hours when normal ESDP
documentation activity is light, other programs can be run that
are not under the general QTAM-ESDP set up. Examples of such
programs are:

File cleanup programs--It may be necessary to move data
on the direct access storage devices in order to reuse space
freed via deletion of records. This reorganizing is one type of
file processing that might be performed off-line. In addition,
there are normal utility functions such as disk copying, disk
printing, etc., that could fit in this category.

Pre-processors--There may be some pre-processing
desired for the CAINT Executive Language. This is particularly
true when debugging macros are to be used. Such pre-processors
could operate off-line.

2. Hardware Assumptions. Hardware has not been considered in
this study, except indirectly, when feasibility of attaining
various objectives was considered. There were, however, some
basic hardware assumptions underlying the study. These are:

a. Machine Utilization

A computing system will be dedicated to ESDP.

b. Machine Type

A System/360 computer with Operating System/360 was
used for the experimental programming in this project. This
choice of hardware, of course, is not mandatory. However, much
of the discussion in this report is based on S/360 with OS and,
therefore, uses that terminology.


II


DATA BASE 


We foresee the need for several files, or, in the 
terminology adopted herein, file sets. While it would be
possible to store most of the information to be described below 
in one monolithic file, this breakdown recognizes differing 
frequencies of file modification, different processes to be 
performed on data, and different means of control of access. 

1. Program Description File Set. This file set contains a
logical record for each unit of programming. Its organization 
will bear a close resemblance to the outline of a conventional 
program description, but there is no permanent standard and it is 
expected and encouraged that the content and composition of this 
file will be shaped by the users to fit their own needs. 

The major subjects to be covered, in a generalized form
of the file are:

o Identification--of the program, programmer, date,
  etc.

o Program Structure--in terms both of the
  hierarchical structure of the program and of the
  branching, or control, structure.

o Data References--the data items named by the
  program and the nature of their use.

o Logic Description--both symbolic and natural
  language descriptions of what the program does,
  how, why.

o Management and Status Data--information relative
  to the program as an item being produced, its
  schedule, progress, problems, etc.

o Illustration References--references to flow charts
  and tables to be composed by ESDP and to be
  printed with this program description information.
  Also, references to other graphics, used for
  illustration, which are not able to be stored
  within the ESDP computer.

Except for the identification section, for which no amplification 
is necessary, these items are discussed below in greater detail. 

a. Program Structure 

This section would contain pointers to related UOP's.
There are two general categories of relationship: hierarchical
and control. Hierarchical pointers would indicate subordination
or superordination, and control pointers would indicate entry
points or predecessor and successor UOP's. Entries from, or
exits to, label variables would be treated as a special case,
with the variable representing a program switch which might be
given a form of UOP status. Also contained in this section would
be a codification of the type of branch control (whether
unconditional, such as a PL/I GO TO; or conditional, such as an
IF or DO) and the variables that affect the branch. In addition,
there would be narrative explanations of the control logic, or
pointers to such explanations.

b. Data References 

The exact extent to which data documentation should be 
split or duplicated between the program and the data description 
files depends on the philosophy of management of the object 
system. At a minimum, this section of a program description file 
must list the data elements that occur in the program, and must 
give the nature of the usage, such as a control variable (a
variable that directly affects a branching decision), a computed
value (set by an assign or DO statement) or any of a number of
other categories of usage. The bulk of the actual description of 
the data, as differentiated from the codification of the nature 
of its use, will be carried in the Data Description File Set. 

c. Logic Description 

... program does, h ... abstraction. He ... items contained i ...

d. Management and Status Data

The obvious kind of information to be collected has to
do with progress in meeting schedules and milestones, manpower
assigned, problems being met or anticipated, and budget data.
Mostly, the information will be collected by interrogation, but
some, such as number and dates of compilation, program lengths,
etc., can be acquired automatically.

e. Illustration References


We may group illustration material into three classes:
flow charts or tables that are required according to the system 
documentation plan, flow charts or tables volunteered by the 
programmer or file documentor to augment some aspect of his
narrative, and other illustrative material that is volunteered, 
but is not in flow chart or table form. The restriction on flow 
chart and table form arises, of course, from the planned use of 
standard graphic programs that will produce these configurations 
easily, but cannot handle the full range of graphic input. Thus, 
if a programmer wishes to input a logarithmic graph, or a diagram
of two aircraft on a collision course, he will probably have to
draw these in the conventional manner, but include in the
machine-stored documentation, a reference to the illustration
copy. 


In the program description file set will be stored only
pointers to the detailed illustration information, whether or not 
stored within ESDP. We recommend this separation because these 
files will be large, and, while they may be updated whenever the 
program is, they will rarely be subject to information retrieval 
searches in the same way as detailed data in the remainder of the 
file. A user may want to retrieve the information that is
displayed on a chart, but he will not normally want to retrieve 
the detailed FLOWCHART instructions that organize the display. 

2. Data Description File Set. The recommended approach for
documentation of data files is to create a data description
record for each file or structure used in any program, with as
many working records as needed to give each user a chance to
document the data the way he prefers. Another version of the 
descriptive record will be created and maintained by an 
authorized COMPOOL, or data base, controller in whom will be 
vested authority to make final decisions on data definitions and 
attributes, and who can delete working records at will. His 
expected mode of operation, then, should be to review his data 
items periodically, look at the conflicting descriptions or 
requests of the individual programmers, make his decision on 
which version to accept or declare, and put the final decisions
into the permanent record. 

Provision can be made, using the dissemination 
services, to promulgate the data base controller's decisions 
immediately to all programmers concerned. 

The organization of the data description record will be 
similar to that for a program. It will have the following major 
headings: 

o Identification

o Element description--the narrative and
  other information about the item and its
  use.

o Structure--primarily, this is used for
  higher level structures in order to
  define the subordinate structure.

o Using program references--pointers,
  possibly some supporting text on what
  effect the particular using program has
  on the data.

o Illustration references

3. Program File Set. This file set will contain the text of the
programs being documented. In addition, consideration will be 
given to storing, within this file set, program change 
information, separate from the program, itself. This will permit 
a record to be kept of all changes, and this will permit 
programmers to make changes either to the latest version of a 
program, any previous version, or both simultaneously. The 
complete text of any version of the program could be retrieved on 
request. Another possible class of information for this file set 
is partially reduced program analysis data. This would be 
intermediate output, produced during a program analysis run, 
which could be saved to reduce the time required to process a 
change to the program. 

4. Graphic Coding File Set. We recommend the use of a program 
for the automatic production of flow charts and tables. In some 
systems, such as FLOWCHART [1], graphics are assembled by the 
issuance of commands on how to build them, in a manner similar to 
computer programming. The Graphic Coding File Set would contain 
these instructions. 

The graphic files may be updated separately from the 
program or data files they illustrate, but this form of updating
should probably be restricted to changes in layout. Changes in
content or structure should be keyed to changes in the data or 
programs being described, although the initiative for a change 
may originate with either a program or data description change or 
with a graphic change. 

5. Publication File Set. This file set will contain partially 
processed documentation, taken from any of the other files. The 
preparation of copy for publication can be a time-consuming 
process. Hence, partially edited material should be retained in 
machine readable form for reprinting or for selection for 
inclusion in differently-organized documents. This form of 
storage is used by the IBM Administrative Terminal System [2], a
text processing system, a successor of which is recommended for 
use in ESDP. 

6. Instruction Course File Set. Instructional material produced
through ESDP plays a dual role. It is used in its own right as a
system program, but it is also a form of documentation and
changes in it must be keyed to changes in the programs or data
being taught. Hence, instruction courses can be stored as
programs, but must have appropriate pointers back and forth to
the documentation from which they were derived.


7. Dissemination File Set. These files will contain the profile
and distribution lists needed to operate the internal ESDP 
dissemination system on documentation and changes thereto. 


8. Index File Set. These files are those indexes and inverted
indexes used by the information retrieval system to carry out its 
functions. These are also dynamic files, which are subject to 
frequent change as the documentation files change. 


9. Buffer File Set. Buffer files are dynamically created files
and are for use by the information retrieval system.

PROGRAM ANALYSIS


1. General. A Program Analysis (PA) program has been produced
by IBM [3] as part of its own internally sponsored ESDP
activities. This program accepts input in PL/I, OS/360 Job
Control or OS/360 Linkage Editor languages and compiles a
canonical or structural data file descriptive of the hierarchical
and control structure of the programs and their usages of data.

Source Code Analysis is performed by a set of compiler-like
analyzers which are oriented to a particular language. The
number of analyzers is dependent on the make-up of the user's 
programming system. The role of each analyzer is identical 
regardless of the language, namely to map source code into the 
UOP coordinate structure and generate the data records associated 
with it. In this way, each analyzer, which is necessarily 
language dependent, can effect a common interface with the 
system. 


Three analyzers have been written, for OS/360 JOB
Control Language (JCL), OS/360 Linkage Editor Language, and
PL/I Language. This sample was selected to permit
experimentation with programming systems written primarily in
PL/I, of which the analyzer, itself, is an example. We note again
that the treatment of run-time languages (e.g., JCL) as
programming languages is a critical point since it is key to the
automatic analysis of system-wide interactions.

Current compilers and assemblers now generate source 
code listings and cross reference lists for data variables for a 
single program at compilation time. However, this is ordinarily 
the extent of their automatic capabilities. Additional 
programming information on program interactions within a larger 
system, rationale behind program logic and program groupings, 
data flow through the system, and so on, are necessarily based on 
interrogation-acquired documentation. 

The analyzer parses the source program into elements 
called Units of Programming (UOP). The current program produces 
UOP's at the following levels: 

JOB
LOAD MODULE
SOURCE MODULE (Compilation Unit)
CALL MODULE - Procedure Block
GROUP - BEGIN block
        DO group
        IF compound statement
SEGMENT
        ON compound statement


For each UOP in a structure, a data record is created
which contains the appropriate structure, logic and data usage
information. This structure and logic data of the individual UOP
records is also the mechanism for creating the total structure of
a program system or any of its major components.


To make the programmer aware of how ...
structured, a revised program listing is ...
graphically depicts the coordinate structure a...
this program. This revised listing is useful ...
guide to the files, but also as a picture of p...
which may easily become obscure in the co...
listing, particularly with free format languages ...
statements can be strung together in a single pr...

System-wide interactions of a program can be obtained
through the automatic analysis of the Object Module generated by 
the compiler and the JCL deck that would be written for 
execution. 


Object module analysis yields information regarding
program interaction (external procedure in PL/I terms) in a
system. This process consists, basically, of extracting symbols
from the external symbol dictionaries of all the object modules
involved in a given linkage editor run, detecting module
cross-references and discriminating between data references and
branch references.

This information is critical when sets of programs are
linked together and manipulate the same data, since this is the 
source of most problems and delays during integration of 
programming systems where the various pieces were written and 
debugged by different people. 
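
The cross-reference step can be sketched as below. The input layout is a simplification assumed for the example: each module's external symbol dictionary is reduced to the symbols it defines (tagged `"entry"` or `"data"`) and the symbols it refers to. The report does not say how the analyzer tells data references from branch references, so the rule used here (look at what the defining module declares) is an assumption, as are the module names.

```python
def cross_reference(esds):
    """Resolve external references across modules.

    esds maps a module name to
        {"defines": {symbol: "entry" or "data"}, "refers": [symbol, ...]}
    Returns (branch_refs, data_refs, unresolved).
    """
    # First pass: record where every external symbol is defined.
    defined_in = {}
    for module, esd in esds.items():
        for symbol, kind in esd["defines"].items():
            defined_in[symbol] = (module, kind)

    # Second pass: classify each reference by the kind of its definition.
    branch_refs, data_refs, unresolved = [], [], []
    for module, esd in esds.items():
        for symbol in esd["refers"]:
            if symbol not in defined_in:
                unresolved.append((module, symbol))
                continue
            target, kind = defined_in[symbol]
            record = (module, target, symbol)
            if kind == "entry":
                branch_refs.append(record)
            else:
                data_refs.append(record)
    return branch_refs, data_refs, unresolved


esds = {
    "MAIN": {"defines": {"MAIN": "entry"}, "refers": ["SORT", "TABLE", "PRINT"]},
    "SORT": {"defines": {"SORT": "entry"}, "refers": ["TABLE"]},
    "TABLE": {"defines": {"TABLE": "data"}, "refers": []},
}
branch_refs, data_refs, unresolved = cross_reference(esds)
```

In this example both MAIN and SORT turn out to manipulate TABLE, which is the kind of shared-data interaction the paragraph above identifies as a source of integration problems.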

Within OS/360, the execution of a program would require 
a JOB Control deck or program. The analysis would equate, within 
the UOP records, the file declarations at the JOB level with all 
references to these files down to the SEGMENT level. In a more 
complex case, where condition codes and multiple job steps were 
defined, this same correlation of program units and data usage
would have added significance.

2. Operation of Program Analysis. The analysis of PL/I code is 
performed in several phases. In Phase 1, PL/I source code is 
read in, and blanks, comments, and constants are eliminated. The 
remaining characters are translated through use of a translation 
table. The general effect of this translation is to replace the 
source language string with numeric codes in such a way that 
alphanumeric strings are grouped in the higher number codes,
special characters in the lower number codes, and operation codes
in the middle. This is done so that in future processing, simple
limit testing can be used on the codes to determine the type.
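
The translation and limit-testing idea can be sketched as follows. The actual code values and character groupings used by the analyzer are not given in this report, so the two limits, the three character sets and the treatment of arithmetic operators as "operation codes" are assumptions chosen only to show the ordering: special characters lowest, operation codes in the middle, alphanumerics highest.

```python
SPECIAL_MAX = 29     # assumed: codes 0..29 are special characters
OPERATION_MAX = 59   # assumed: codes 30..59 are operation codes
                     # codes 60 and above are alphanumerics

SPECIALS = "();,.:'"
OPERATORS = "+-*/=<>&|"
ALPHANUMERICS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$#@_"


def build_table():
    """Build the character-to-code translation table."""
    table = {}
    for i, ch in enumerate(SPECIALS):
        table[ch] = i
    for i, ch in enumerate(OPERATORS):
        table[ch] = SPECIAL_MAX + 1 + i
    for i, ch in enumerate(ALPHANUMERICS):
        table[ch] = OPERATION_MAX + 1 + i
    return table


TABLE = build_table()


def translate(source):
    """Drop blanks and replace each remaining character by its code."""
    return [TABLE[ch] for ch in source if ch != " "]


def code_type(code):
    """Classify a code using limit comparisons only."""
    if code <= SPECIAL_MAX:
        return "special"
    if code <= OPERATION_MAX:
        return "operation"
    return "alphanumeric"


codes = translate("A = B + C;")
types = [code_type(c) for c in codes]
```

Because the groups occupy contiguous ranges, later phases never need to look a code up in a table to learn its type; two comparisons suffice.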


An output string is organized in which each statement
is given a statement number.


The statements are scanned and whenever labels, file
names, parameters, or condition names are encountered, a
dictionary entry is created, and a pointer to the dictionary
entry is stored with the statement.


When UOP defining statements (e.g., DO, BEGIN) are
encountered in the scan, entries are made in a parsing table.
Then, when statements are encountered that end UOP's (e.g., END),
the table is searched to determine which entry is closed. The
table then contains the statement numbers defining the limits of
the UOP's.
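
A minimal sketch of the parsing-table logic follows. It assumes statements have already been reduced to (statement number, keyword) pairs, treats only PROCEDURE, BEGIN and DO as UOP-defining, and assumes each END closes the most recently opened entry that is still open. Labelled END statements that close several groups at once, and IF or ON compound statements, are left out. The entry layout is hypothetical and is not the format of Figure 3.

```python
OPENERS = {"PROCEDURE", "BEGIN", "DO"}


def build_parsing_table(statements):
    """Return one entry per UOP with its first and last statement numbers."""
    table = []
    for number, keyword in statements:
        if keyword in OPENERS:
            table.append({"kind": keyword, "first": number, "last": None})
        elif keyword == "END":
            # Search backward for the most recent entry that is still open.
            for entry in reversed(table):
                if entry["last"] is None:
                    entry["last"] = number
                    break
            else:
                raise ValueError(f"END at statement {number} closes nothing")
    return table


statements = [
    (1, "PROCEDURE"), (2, "DO"), (3, "ASSIGN"), (4, "END"),
    (5, "BEGIN"), (6, "CALL"), (7, "END"), (8, "END"),
]
table = build_parsing_table(statements)
```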


At the completion of Phase 1, the parsing table has 
been filled, and the dictionary has been partially filled. 

Phase 2 reformats the source text, indicating the 
parsed units and statement numbers. The units are indicated in 
such a way as to ease reading. 

In Phase 3, DECLARE statements are analyzed. This is 
done using an array of attribute masks. Each data attribute is 
represented by a 32-bit mask (row). Each element, A(i,j) 
represents the interaction of attributes i and j. If A(i,j) is 
one, then the two attributes can co-occur. Zero means that they 
cannot co-occur. For instance, EXTERNAL can co-occur with FIXED 
but not with INTERNAL. Each attribute is looked up in the table 
of masks and all of the masks are AND'ed together. The result is
a 32-bit string with ones representing the attributes of the 
DECLARE'd data. Note that by starting with the assumption that 
all attributes apply and then ruling out impossibilities, 
defaulted attributes are also depicted. Scope tables are 
generated for the data and these plus the attribute masks are 
added to the dictionary, which is now completed.
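
The mask technique of Phase 3 can be sketched with a reduced attribute set. Six attributes and 6-bit masks stand in for the 32-bit masks of the text, and the three mutually exclusive pairs listed in `EXCLUSIVE` are assumptions chosen for the example rather than the analyzer's actual compatibility table.

```python
ATTRIBUTES = ["EXTERNAL", "INTERNAL", "FIXED", "FLOAT", "DECIMAL", "BINARY"]
BIT = {name: 1 << i for i, name in enumerate(ATTRIBUTES)}
ALL = (1 << len(ATTRIBUTES)) - 1

# Pairs assumed, for this sketch, to be unable to co-occur.
EXCLUSIVE = [("EXTERNAL", "INTERNAL"), ("FIXED", "FLOAT"), ("DECIMAL", "BINARY")]


def build_masks():
    """Row i has a one in column j when attributes i and j can co-occur."""
    masks = {name: ALL for name in ATTRIBUTES}
    for a, b in EXCLUSIVE:
        masks[a] &= ~BIT[b]
        masks[b] &= ~BIT[a]
    return masks


MASKS = build_masks()


def analyze(declared):
    """Start with every attribute possible, then AND in each declared row."""
    result = ALL
    for name in declared:
        result &= MASKS[name]
    return result


def names(mask):
    """List the attribute names whose bits are set in the mask."""
    return [n for n in ATTRIBUTES if mask & BIT[n]]


result = analyze(["EXTERNAL", "FIXED"])
surviving = names(result)
```

For a declaration of EXTERNAL FIXED, the bits for INTERNAL and FLOAT are ruled out while DECIMAL and BINARY both survive, since nothing declared excludes either. How the real analyzer resolves such a remaining choice into a single default is not described here.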

At this point in the Program Analysis process, two 
internal tables have been built--the dictionary and the parsing 
table. Their formats are described below. 


a. Dictionary 

(3) Bytes 7-8 contain for structures, pointers to
structure elements, and for labels, the statement number of the
label declaration.

(4) Bytes 9-28 contain the identifier as it 
appears in the source code. 

(5) Bytes 29-32 contain a bit table that defines
the unique attributes or characteristics of the entry. 

(6) Bytes 33-40 are a set of offset values that
point to the overflow area. Note that certain PL/I attributes
carry value information, e.g., precision, bounds of dimensions,
file environments, etc. This value data is stored in the
overflow area and the offsets are used to delimit the start and
stop of various values.

(7) Bytes 40-119 contain any values associated 
with attributes. 

        Hash    Scope   Structure   Data   Attributes   Table of   Overflow
        Chain           Pointers    Name                Offsets    Area

Byte    1       3       7           9      29           33         40


Figure 2. A Typical Dictionary Entry.


        Switch   Level    Including   First       Last        Dictionary
                 Number   Procedure   Statement   Statement   Pointer
                          Name        Number      Number

Byte    1        2        10          74          86          98


Figure 3. A Typical Parsing Table Entry.

The string generated in Phase 1 is read in Phase 4, and
a new string is produced that is completely coded. All
identifiers are replaced with dictionary pointers.

Phase 5 determines data type for all data in the 
dictionary and adds a data type code to the dictionary. 

Phase 6 reads in the parsing table and reads in the 
program statements, one at a time. From these it generates the 
UOP records and with the additional input of the dictionary it
generates the trailer records. These are written out on tape. 

The JCL cards are used by the JCLSCAN Program, and each 
card is examined to determine if it is a JOB card, an EXEC card, 
a DD card, or other. Cards in the other category are immediately 
rejected. JOB cards are further examined for condition codes at 
the JOB level. If they exist, they are stored on an analysis 
list. 

For EXEC cards, the program stores the job step name in 
the analysis list and then determines if the name refers to a 
catalogued procedure. If it does, the name is marked as job 
level. If it does not, the name is marked as load module level. 
The EXEC card is then checked for JOB STEP parameters, and if
there are any, they are stored in the analysis list. The same
process is followed for JOB STEP condition codes. 

For DD cards, the DSNAME is stored in the analysis list 
along with any disposition parameters. 
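
The card-by-card scan can be sketched as below. This is not the JCLSCAN program: the regular expressions handle only single-card statements in a simplified form, and the entry field names (`cond`, `parm`, `level`, `dsname`, `disp`) are invented for the example. The sketch also assumes that an EXEC card naming a program with PGM= is at load module level and that any other EXEC card refers to a catalogued procedure.

```python
import re

# //name  JOB|EXEC|DD  operands
CARD = re.compile(r"^//(\S*)\s+(JOB|EXEC|DD)\b\s*(.*)$")


def find(keyword, operands):
    """Return the value of KEYWORD=value; value may be (..), '..' or plain."""
    m = re.search(keyword + r"=(\([^)]*\)|'[^']*'|[^,\s]+)", operands)
    return m.group(1) if m else None


def scan(cards):
    """Build the analysis list from a sequence of card images."""
    analysis = []
    for card in cards:
        m = CARD.match(card)
        if m is None:
            continue  # cards in the "other" category are rejected
        name, kind, operands = m.groups()
        entry = {"card": kind, "name": name}
        if kind == "JOB":
            entry["cond"] = find("COND", operands)
        elif kind == "EXEC":
            is_program = re.search(r"\bPGM=", operands) is not None
            entry["level"] = "load module" if is_program else "job"
            entry["parm"] = find("PARM", operands)
            entry["cond"] = find("COND", operands)
        else:  # DD
            entry["dsname"] = find("DSNAME", operands)
            entry["disp"] = find("DISP", operands)
        analysis.append(entry)
    return analysis


cards = [
    "//PAYROLL JOB (123),'SMITH',COND=(8,LT)",
    "//* A COMMENT CARD",
    "//STEP1 EXEC PGM=IEWL,PARM='XREF'",
    "//STEP2 EXEC PROC=PLICLG",
    "//SYSLMOD DD DSNAME=MYLIB(PROG),DISP=(OLD,KEEP)",
]
analysis = scan(cards)
```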

After all of the JCL cards have been read, the analysis 
list is further processed, the process varying with the type of 
JCL statement. 

JOB - The UOP name is extracted from the job statement
label field. The entry and exit portions of the UOP are marked,
and if condition codes exist, subordinate UOP's are marked as exit
points.

DD - A data reference entry is made in the UOP for the
DD.

EXEC - JOB STEPS become subordinate units to the JOB
UOP. The UOP names are the JOB STEP names. If JOB STEP
parameters exist, a data reference entry is made using a dummy
name. If JOB STEP condition codes exist, the subordinate
transfer table is marked accordingly.


The Linkage Editor Analysis Program (LEAP) begins by 
reading from the primary input stream. A test is made to 
determine if the first entry in the stream is a linkage editor
command. If it is not, the entry is processed as an object 
module. If it is, another test is made to determine if the 
command is an INCLUDE statement. INCLUDE statements effect 
readings from secondary input streams. All other command types 
are ignored. Object module processing continues in the primary
stream until an INCLUDE is found. Then, processing shifts to the
secondary stream. In the secondary stream, the first column of
the card image is checked for a blank (indicating command), a
12-9-2 punch (indicating an object module), or any other
(indicating a load module). The first two are processed as
previously discussed and the third (load module) causes a load
module subordinate unit entry to be established.


Once all of the linkage editor object modules, load 
modules and commands have been processed, a UOP record is formed. 
This UOP is in the same format as a PL/I UOP. 
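
The first-column test applied in the secondary stream can be sketched as follows, assuming card images are available as EBCDIC byte strings. In that encoding a blank is X'40' and a 12-9-2 punch reads as X'02'. The function name and the sample cards are hypothetical, and Python's `cp037` codec is used only to build EBCDIC test data.

```python
EBCDIC_BLANK = 0x40          # blank in EBCDIC
OBJECT_MODULE_PUNCH = 0x02   # byte value of a 12-9-2 punch


def classify_card(card):
    """Classify a secondary-stream card image by its first column."""
    first = card[0]
    if first == EBCDIC_BLANK:
        return "command"
    if first == OBJECT_MODULE_PUNCH:
        return "object module"
    return "load module"


command_card = " INCLUDE SYSLIB(SORT)".encode("cp037")
object_card = b"\x02" + "ESD".encode("cp037")
other_card = "X".encode("cp037")

kinds = [classify_card(c) for c in (command_card, object_card, other_card)]
```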


3. Additional Requirements. There must be added to the program
analysis implementation an incremental analysis capability. When 
a programmer makes a change to an existing program, he should not 
have to run the entire program through analysis again. This 
process now takes an amount of time on the same order as a full 
compilation, hence in a large system it could become a 
significant drain on computer capacity if repeated often. 
Instead, the approach recommended is to have ESDP store the 
latest copy, and let the programmer make changes by use of ADD, 
CHANGE, and DELETE commands, treating his stored program as a
file. In this way, PA need only analyze the changes and make 
minimal modification to the canonical data file, and new 
interrogations can be initiated only on those portions of the 
program affected. 
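
The recommended editing approach might look like the sketch below. The command tuple layout, the use of statement numbers as keys, and the function name are assumptions. The point illustrated is only that applying ADD, CHANGE and DELETE to the stored copy yields directly the set of statements that PA would need to re-analyze.

```python
def apply_commands(program, commands):
    """Apply edit commands to a stored program.

    program  maps statement number to statement text.
    commands is a list of (operation, statement number, text) tuples.
    Returns the updated program and the sorted list of affected numbers.
    """
    updated = dict(program)  # the stored copy itself is left untouched
    affected = set()
    for op, number, text in commands:
        if op == "ADD":
            if number in updated:
                raise ValueError(f"statement {number} already exists")
            updated[number] = text
        elif op == "CHANGE":
            if number not in updated:
                raise KeyError(number)
            updated[number] = text
        elif op == "DELETE":
            del updated[number]
        else:
            raise ValueError(f"unknown command {op}")
        affected.add(number)
    return updated, sorted(affected)


program = {10: "X = 1;", 20: "Y = X + 1;", 30: "PUT LIST (Y);"}
commands = [
    ("CHANGE", 20, "Y = X + 2;"),
    ("ADD", 25, "Z = Y;"),
    ("DELETE", 30, None),
]
updated, affected = apply_commands(program, commands)
```

Only the statements listed in `affected` would need to pass through analysis again, and new interrogations could be limited to the UOP's containing them.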


More detailed information than that produced by the
present PA program is needed for classifying the manner in which
data labels are used or referred to within the program text. For
a variety of reasons (e.g., making up more detailed
interrogation plans, assisting in test planning, assisting in
debugging, and providing better cross indexing of documentation),
we feel that data usage should be classified in as much detail as
possible. Furthermore, the information desired is available
through the program analysis function, but currently is discarded
rather than saved (this is also true of compilation). A
hierarchical classification code should be used for each
appearance of a data label. This code should reflect whether the
item is changed by this usage or not; whether it is changed by
being assigned a new value or having a new value read in;
whether it is used without being changed; whether when used in an
assignment statement, it is used as an item in itself, or an
index to another item, a control item, etc. A first
approximation to such a classification system is given in Figure
4, which also appeared in Volume 1.
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1 


Context of Appearance 


1.1 Assignment Statement 

1.1.1 Computed Value 

1.1.2 Argument 

1.2 Control Statement 

1.2.1 Variable I/O Command 

1.2.2 Branching or Transfer Command 

1.2.2.1 Argument or condition statement (IF, ON)

1.2.2.2 Iterative Control Variable (DO)

1.2.2.2.1 Initial index value

1.2.2.2.2 Increment 

1.2.2.2.3 Maximum value or limit 

1.2.2.3 Variable address 

1.3 Subroutine/Function/Macro Calling Sequence 

1.3.1 Transmitted to SR/Function/Macro

1.3.2 Received from SR/Function/Macro 

1.4 Data Declaration Statement (or other non-executable
statement)

1.5 Input/Output 

1.5.1 Input 

1.5.1.1 Input Control Variable

1.5.1.2 Data Element read in

1.5.2 Output 

1.5.2.1 Output Control Variable 

1.5.2.2 Data Element written out or transmitted 

2. Change Status 

2.1 Value Changed by Containing Statement 

2.1.1 Value Directly Assigned by Assignment Statement

2.1.2 Value Directly Changed by DO Statement 

2.1.3 Value Directly Changed by Variable I/O Statement. 

2.2 Value not Changed by Containing Statement 

3. Structural Role

3.1 Data Element is a Structure or Array 

3.2 Index or Subscript 

3.2.1 Value of an Index

3.2.2 Element of an Index Term 

3.3 Scalar Item 


Figure 4. Classification of Data Usage by a Program. 
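
One way to carry such a hierarchical code with each appearance of a data label is sketched below. The outline numbers come from Figure 4; the three-part record, the abbreviated tables, and the names UsageCode and within are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Abbreviated tables keyed by Figure 4 outline numbers (illustrative subset).
CONTEXT = {
    "1.1.1": "assignment statement, computed value",
    "1.1.2": "assignment statement, argument",
    "1.2.2.2.1": "DO control, initial index value",
    "1.5.1.2": "data element read in",
}
CHANGE = {
    "2.1.1": "value directly assigned by assignment statement",
    "2.1.3": "value directly changed by variable I/O statement",
    "2.2": "value not changed by containing statement",
}
ROLE = {
    "3.1": "structure or array",
    "3.2.1": "value of an index",
    "3.3": "scalar item",
}


def within(code: str, prefix: str) -> bool:
    """True when `code` lies at or below `prefix` in the hierarchy."""
    c, p = code.split("."), prefix.split(".")
    return c[: len(p)] == p


@dataclass(frozen=True)
class UsageCode:
    """Hypothetical record: one code from each of the three facets."""
    context: str
    change: str
    role: str

    def __post_init__(self):
        for value, table in ((self.context, CONTEXT),
                             (self.change, CHANGE), (self.role, ROLE)):
            if value not in table:
                raise ValueError(f"unknown classification code {value}")

    @property
    def changed(self) -> bool:
        return within(self.change, "2.1")
```

For a statement such as X = Y(I) + 1, X might be coded ("1.1.1", "2.1.1", "3.3") and I coded ("1.1.2", "2.2", "3.2.1"). Because the codes are hierarchical, a query can ask for every usage under "2.1" without listing each leaf.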
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Another aspect of program analysis (or possibly of 
information retrieval) to be borne in mind is that, as the 
documentation files grow large, there will be inevitable errors, 
such as programmers misnaming programs, submitting the wrong 
version of a program for analysis, entering changes incorrectly 
(resulting in an actual program that differs from what the author 
thinks it is), etc. These are normal mistakes of any programming 
project and, in a purely manual system they can be tolerated and 
relatively easily reversed. The documentation file system and 
the program analysis system must be so designed as to anticipate 
such errors and, while it is not ESDP's responsibility to detect 
them, it should be possible within ESDP to correct them with 
minimum difficulty. 
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V 


INFORMATION RETRIEVAL 


1. ESDP Files. The files handled in the ESDP system include the
following:

a. Program Description File Set 

This file set contains a record for each UOP in the
object system. The information in the file may be derived
through program analysis, interrogation, or both, with the source
being identified.


b. Data Description File Set 

This file contains a record for each Unit of Data 
(UOD). Here, the information is obtained through interrogation
only. UOP and UOD are linked via pointers since the data are 
referenced in UOP. 


c. Index File Set

... words indicating for each the UOP ... the key word. The key
words are ... time that a response to a question ... At that
time, the UOP name of ... the interrogation and the IEN ...
appended to all key words in the ...


d. Buffer File Set

Provision is made in the ESDP concept for general file 
handling capabilities. Programs that interpret file format 
tables for file accessing will be included. In addition, file 
building may be done on line as well as off line. The intended 
use of the special files is as personalized subsets of the ESDP 
data base. It is anticipated that this feature would be heavily 
used by system managers to create, update, and search 
personalized management information systems. 


2. File Building and Maintenance. Creation and modification of
UOP records and UOD records are planned in ESDP to match the
changes in the object system of programs. Records are created
whenever the system becomes aware of a new UOP or UOD. This
information may be acquired in any one of a number of ways:


a. Source Code Parsing 


Program Analysis creates UOP's by parsing source 
language code. UOP's created are named either by program label 
or by a combination of containing UOP name and statement numbers. 
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b. Source Code References 

References may be made in the source code to UOD or 
other UOP not subject to Program Analysis. The appearance of 
these references in source coding will cause the creation of the 
appropriate records and names will be taken from the source code.

c. Interrogation 

UOP or UOD may also be named by the programmer at a
console in the interrogation process. This can occur during
design interrogation or during interrogations performed after the
object system program has been subjected to program analysis.
Naming of these UOP's and UOD's is a simple process since the
programmer assigns the names.

Whenever changes to source code are submitted for 
analysis or incremental interrogations are processed, changes to 
data items in existing UOP or UOD records are likely to take
place. The changes can take the form of ADD, DELETE, or REPLACE
(DELETE and ADD). The way in which the system handles the file
updating will depend on the data elements to be changed and the
manner in which the requested changes are entered into the 
system. 


ADD'ed data items derived from interrogation may be
handled directly since a full record is created for each UOP or
UOD whether or not all of the data item fields contain
information. Therefore, an ADD amounts to storing the new
information into an appropriate position in the core-resident
image of the UOP/UOD record, and rewriting the record to the disk
file. Cross references are added in the normal manner. DELETE's
and REPLACE's present a more difficult updating problem, however.
Again, data may be deleted in the same manner as it was added
above. In this case, however, cross references must also be 
updated. For instance, assume that a programmer wishes to delete 
textual information associated with a given IEN. The text may be 
deleted, but keyword references to the text must also be. This 
will be done by performing a second keyword extraction on the 
text to be deleted. The keywords extracted will then be used as 
search arguments for the keyword index records so that the 
appropriate IEN pointers may be deleted. 

The general concept for ESDP file updating as a result
of changes to source programs is to rerun Program Analysis on the
UOP containing the changed UOP. Old UOP records will not be
erased at this time. Through the reconciliation process,
information associated with the old UOP records will be linked to
the new UOP records. When the reconciliation has been completed,
the old records will be deleted. Keyword references must be
updated during the reconciliation process. If text from an IEN
of the old version of a record is to be moved to another IEN of
the new version, the keyword updating amounts to changing the IEN
pointers in all of the appropriate keyword records. If new text
is typed in, keywords are extracted in the normal fashion. For
all deleted records, the keyword deletion is performed as in the
case of deleted text through incremental interrogation.


3. Keyword File. An experimental keyword extraction program is
now being tested. This program operates as follows: 


(1) Responses to questions are subjected to
key word extraction under the control of the CEL Program.

(2) Responses are edited to eliminate deleted 
lines, to eliminate deleted characters (backspace and retype), to 
eliminate carriage returns, and to convert all letters to upper 
case. This is done to eliminate mismatches in the keyword list. 
For instance, "Computer" without such editing would not match 
with "computer" and similarly carriage return characters, 
backspace characters, or delete characters will eliminate any 
possibility of an exact match. 


(3) Each word in the response is compared with 
words in a common word list. Common words are not stored as 
keywords. 


(4) Each keyword (i.e., not common word) is 
stored and is tagged with the IEN associated with the question to 
which this is a response. If the keyword is already recorded, 
the IEN is added to a list of IEN's in which the word appeared. 
In addition to the IEN, the keyword could be tagged with its
position within the response. This would enable subsequent
retrieval based on position of keywords in a response. 
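
Steps (1) through (4) can be sketched as below, together with the deletion by a second extraction described under File Building and Maintenance. The common word list, the line-delete control character, and all names are assumptions for illustration; the experimental program itself is not reproduced in this report.

```python
import re
from collections import defaultdict

COMMON_WORDS = {"THE", "A", "AN", "IS", "OF", "TO", "AND", "IN"}  # illustrative
BACKSPACE = "\b"
LINE_DELETE = "\x18"  # hypothetical "delete this line" control character


def edit_response(raw: str) -> str:
    """Step 2: drop deleted lines, apply backspaces, join lines, upper-case."""
    lines = []
    for line in re.split(r"[\r\n]+", raw):
        if LINE_DELETE in line:
            continue
        out = []
        for ch in line:
            if ch == BACKSPACE:
                if out:
                    out.pop()  # backspace-and-retype removes one character
            else:
                out.append(ch)
        lines.append("".join(out))
    return " ".join(lines).upper()


def extract(response: str):
    """Step 3: return (position, word) pairs for words not on the common list."""
    words = re.findall(r"[A-Z0-9]+", edit_response(response))
    return [(pos, w) for pos, w in enumerate(words, start=1)
            if w not in COMMON_WORDS]


class KeywordIndex:
    def __init__(self):
        self.entries = defaultdict(list)  # keyword -> [(IEN, position), ...]

    def add_response(self, ien: str, response: str) -> None:
        """Step 4: tag each keyword with the IEN and its position."""
        for pos, word in extract(response):
            self.entries[word].append((ien, pos))

    def remove_response(self, ien: str, response: str) -> None:
        """Deletion: re-extract keywords from the text and drop its IEN pointers."""
        for _, word in extract(response):
            kept = [e for e in self.entries.get(word, []) if e[0] != ien]
            if kept:
                self.entries[word] = kept
            else:
                self.entries.pop(word, None)

    def iens(self, keyword: str):
        return sorted({ien for ien, _ in self.entries.get(keyword.upper(), [])})
```

Storing the position alongside the IEN costs little and keeps open the option of retrieval by keyword position.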

4. Searching. Information in ESDP is indexed in four ways: 

a. Program Element 

One index to a piece of data is the particular element 
(UOP, UOD, etc.) with which it is associated. This information 
is obtained through interrogation for design documentation and 
through program analysis for final documentation. 

b. Keywords 

Another index is the keyword index. The keywords are 
extracted automatically from responses to interrogation 
questions. 

c. Data Names and Labels 

These are character strings used in the program or the
program design being documented. They, too, serve as indexes to 
the UOP or UOD records. 

d. Hierarchic Code 

ESDP employs a hierarchical coding system and attaches 
a code number to each element of data. This number is called an
Information Element Number (IEN). The structure of these
numerical codes is intended to classify any data collected by 
ESDP about an object system of programs. 

Searching of the ESDP files is requested from terminals 
or ESDP programs. The query language is basically the same
subset of PL/I as is used for executive programming. Again, the
... of ESDP information retrieval is ... the disposition for
retrieved ... dynamically create files from ... may perform
cyclic searches by ... a buffer file, and then using the ...
criteria for another search.


Cyclic retrieval is defined as the use of information 
retrieved from one query as part of the statement of a subsequent 
query to the same or a different file, so that a cycle of query, 
retrieval, query based on retrieval data, retrieval, etc., can be 
set up. 

Dynamic file ... "customized" files for ... organization. These
... application at hand.

... writing out the query in a more natural form (not natural
language, but a programming-like language) or building his query 
gradually through a computer assisted, conversational process. 


In regard to performance, while specifications have not
yet been developed, it seems that the following are required:


o Records must be retrievable on the 
obvious characteristics which are 
usually unique identifiers: address, 

sequence number within a file, value of 
a key or sort field. 


o Records must also be retrievable on the 
basis of Boolean combinations of these 
or other record attributes, each 
attribute (probably) being able to be 
stated as one or more relationship 
statements, as SALARY = 10000 or AGE < 40.


o Individual items, fields, arrays, sub-records,
etc., can be specified as the
information to be retrieved from a
record--the entire record need not be
retrieved in response to a query. Thus,
the burden of extracting the exact
information needed from a record is
placed upon the retrieval system, not
the calling program.


o Information called for may be ordered to
be held in a buffer or temporary storage
area for later reference. In particular,
this requirement is imposed to make
cyclic retrieval possible.


o The requestor, whether a person or a 
program, may specify the recipient of 
the information, which need not be the 
requestor. In other words, an IR system 
user may call for the retrieval of 
information and its presentation to some 
other person, output device, or program.

It should be noted that the requirements are for a system
responsive to both human requestors and programs, with one of the
using programs being a query acquisition program that
communicates with the human user.
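
A minimal sketch of a retrieval call meeting these requirements follows: selection by a Boolean combination of attribute tests, retrieval of named fields only, delivery to a recipient chosen by the requestor, and reuse of buffered results in a later query (cyclic retrieval). The function name retrieve and the sample files are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


def retrieve(file: Iterable[Record],
             where: Callable[[Record], bool],
             fields: List[str],
             into: Optional[List[Record]] = None) -> List[Record]:
    """Select records satisfying `where` and return only the named fields.

    `into` is the recipient named by the requestor, e.g. a buffer file.
    """
    hits = [{f: rec[f] for f in fields} for rec in file if where(rec)]
    if into is not None:
        into.extend(hits)  # hold the output for later reference
    return hits


personnel = [
    {"NAME": "ADAMS", "SALARY": 10000, "AGE": 52},
    {"NAME": "BAKER", "SALARY": 8000, "AGE": 35},
    {"NAME": "COLE", "SALARY": 9000, "AGE": 45},
]
programs = [
    {"UOP": "SORT1", "AUTHOR": "BAKER"},
    {"UOP": "EDIT2", "AUTHOR": "COLE"},
    {"UOP": "LOAD3", "AUTHOR": "ADAMS"},
]

# First query: Boolean combination of relationship statements, names only.
buffer: List[Record] = []
retrieve(personnel, lambda r: r["SALARY"] == 10000 or r["AGE"] < 40,
         ["NAME"], into=buffer)

# Second query: the buffered names become criteria against another file.
names = {r["NAME"] for r in buffer}
second = retrieve(programs, lambda r: r["AUTHOR"] in names, ["UOP"])
```

Placing field selection inside the retrieval call, rather than in the caller, is what lets a person and a program use the same interface.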


A feature that will be required of ESDP will be to 
index narrative interrogation responses to permit access by the 
retrieval system on the basis of response content. There are 
several reasonably well established techniques for doing this. 
One is to use a list of "common" words (articles, the forms of
the verb to be, etc.), delete these from responses, truncate the
remaining words at five or six letters and use them as a keyword
index. Alternatively, a dictionary of system terms can be built 
and this used to identify words in a response that ought to be in 
the index to the response. This list must be constantly modified 
to be sure it is up to date. Another automatic technique that 
might be useful is to require that a special character precede or 
follow a data element, program name, or other system label when 
used in text. In this way, any cross reference in a response can 
be readily identified. 

More generally, the logic of computer assisted 
interrogation gives us the following information about a 
narrative documentation item, before it has been elicited from 
the programmer: 

o Subject of the question--name of UOP or 
data element, particular aspect being
questioned. 

o Structural relationship, for cross-referencing
purposes, with other programs or files.

These items, combined with keywords extracted from the response, 
give the potential of a very rich keyword index for use in 
querying or in automatic dissemination. The same items can be 
used to form an index in each published report. These indexes 
would, of course, be automatically modified if the basic 
documentation were modified, either through interrogation or 
program analysis. 


We anticipate that some number of standard queries will be 
previously written and invoked by the user as he needs them. 


Some of these queries may be complete as stored and some may need
completion or assignment of values to parameters through
interrogation.




This type of standard query should be quite easy to 
implement. The majority of queries, however, will be 
unanticipated. These will be processed through an interpreter 
program designed especially to operate on queries expressed in
the CEL. The interpretive approach is dictated since compilation
of queries cannot be performed rapidly enough to permit an
efficient on-line system. 

Many information retrieval queries will be of a form in 
which a single data file is used and a single IF statement is 
sufficient to decide upon record selection. Often, the key of 
the record will be given so the desired record may be immediately 
retrieved. If the key is not given, the implication is that each 
record must be examined for its compliance with the query, a 
process considerably shortened by the use of inverted indexes, if 
they exist. Checking for the existence of these indexes, and 
making use of them, is a function of the interpreter. 
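
The interpreter's choice of access path can be sketched as follows. The DataFile class and its method names are hypothetical, and the in-core dictionaries merely stand in for disk files and inverted indexes.

```python
class DataFile:
    """Hypothetical in-core stand-in for one keyed ESDP data file."""

    def __init__(self, records, key_field):
        self.by_key = {r[key_field]: r for r in records}
        self.inverted = {}  # field -> value -> list of record keys

    def build_index(self, field):
        """Build an inverted index on one field."""
        index = {}
        for key, rec in self.by_key.items():
            index.setdefault(rec[field], []).append(key)
        self.inverted[field] = index

    def query(self, key=None, field=None, value=None):
        """Return (access_path, records) for a single-condition query."""
        if key is not None:                  # key given: immediate retrieval
            rec = self.by_key.get(key)
            return "key", ([rec] if rec is not None else [])
        if field in self.inverted:           # inverted index exists: use it
            keys = self.inverted[field].get(value, [])
            return "inverted", [self.by_key[k] for k in keys]
        # otherwise every record must be examined against the query
        return "scan", [r for r in self.by_key.values()
                        if r.get(field) == value]
```

The caller states only what is wanted; whether an index exists is discovered at run time, so an index can be added later without rewriting stored queries.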

In a typical query, the program will have been written 
in skeleton form, and the remaining data is acquired at the time 
of invocation. The items acquired are: 

o Record selection criteria—a single IF 
statement, although containing any num¬ 
ber of clauses. 

o The "THEN" functions--what to do with a
selected record, e.g., RETRIEVE items A,
B, C; RETRIEVE A to B(1,1) ("retrieve
item A and place it in record 1 of
Buffer File 1").

o The "ELSE" functions—iteration logic 
will be built into the original, but the 
user can add functions. He may, for 
example, choose to retrieve on the basis 
of a false IF condition. 


o Processing of retrieved output that has
been stored in a buffer, e.g., SORT (B1)
on B(1,1), i.e., sort file B1 on the
first field of a record.

o Additional commands, such as DEFER,
SAVE, etc.
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VI 


PUBLICATIONS 


There are several classes of publications, with fairly 
important economic differences among them. There will be a class 
of output for design notes and change notices. This class will 
be characterized by large volume and high frequency of issue, 
especially early in a system development cycle. These documents 
must be disseminated fully and rapidly. There is no great need 
for many of the niceties of publication that are useful in other 
forms of documentation. They can be printed at the consoles used 
by the recipients, or they can be batch printed on a high-speed, 
centralized printer and disseminated through the organization's 
regular internal mail system. 

As design and production progress, programmers, 
designers and managers will want fairly complete documents on
their own and closely related programs and files. These will be
used for ready reference, and possibly for making notes to be
used later, during interrogations. This class of documentation
is characterized by larger documents of lower frequency of issue,
but probably benefitting from more careful physical layout and
printing. They will, of course, change often, but many times the
holder of such reports can attach a change notice directly to 
this report copy, or make a hand-written note thereupon. He need 
not reproduce the entire report every time there is a change to 
it. 


A third class of documentation is the formal 
documentation normally produced at the end of a project, or for 
major progress or milestone reports. These are printed much less 
often than the others, but require many printing features not
always available on computer-generated documents. 

It appears, at this point, that the logical
capabilities represented by existing programs, such as FLOWCHART
[1] and Administrative Terminal System [2], will handle most of
these documentation problems.

ATS offers all needed features except ability to handle
graphics. It offers a much wider choice of type fonts when
printing at a terminal with changeable type elements, and the
ability to underline text. Variation in type fonts for 
programming documentation is useful to help distinguish, for 
example, between labels or data names and normal English usage, 
as SPEED is a data item, but speed is a rate of motion. 

AUTOCHART [4] enables the entry of flow charts and 
tables. It is designed to accept manually prepared input, hence
should be able to interface smoothly with the interrogation
processor. The designer does his own flow chart layout. The
compensation for the extra work of doing this is a compact chart
organized in the most meaningful way, to the author. Tables and
charts can be modified without complete regeneration, using an
updating interrogation, as in CAINT.
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VII 


FILE PROCESSING 


1. Requirements. From an operational viewpoint, ESDP imposes 
the following storage/retrieval requirements: 

a. Data stored on or transferred to bulk storage must 
be directly accessible to satisfy a broad range of user query 
requirements and data storage requirements from on-line consoles. 


b. Large data base processing capabilities must be 
provided in order not to restrict the size of user programming 
systems. 

c. Evolutionary file growth must be accommodated since 
at the outset of the programming development cycle the ESDP files 
are empty and evolve as the user's programming system develops. 


d. Highly variable record lengths must be allowed 
since these are dictated by the varying characteristics of the 
programs comprising the user's programming system. 


e. The processing cannot rely on predetermined 
knowledge of the distribution of search keys, used in accessing 
data, since these are dictated by the symbol coding conventions 
adopted for the user's programming system and by his natural 
language responses to interrogation. 


f. Certain files are directly related to others. For 
example, keywords are related to the UOP in which they were used.
Therefore, access to one may necessitate access to the other. 


The ESDP file processor addresses these requirements and attempts
to provide a solution that effectively handles each requirement 
within the total context. While this may not be the optimum 
solution for any given requirement, when considered by itself, it 
does cope with the totality of requirements in an effective 
fashion. 

A total file management or information processing 
system was not considered to be an appropriate development based 
on ESDP requirements. The preferred approach was to develop a 
set of generalized modules to perform discrete functions which 
would be usable throughout the ESDP system. 

Experimental versions of the file processing routines 
have been written in the PL/I language. The physical file 
processing uses the Basic Direct Access Method (BDAM) [ *> ] through
PL/I. All data sets are physically organized by regions, where a
region is defined as a unit of storage, equivalent to a disk
track. This equivalence is based on the current PL/I
implementation and may vary as other storage devices are
supported in subsequent implementations.

2. Rationale for ESDP Approach. The following discussion is
limited to accessing techniques for files where data or records
must be directly accessed. It excludes techniques which rely on
sequential data organization and on a total file scan. While the
latter have application in certain classes of retrieval problems,
this is not the case in the ESDP system, since we are dealing
with a large data base and a non-batched query/retrieval
environment.

The accessing problem is one of uniquely locating each 
unit of data within a file. Two general techniques can be used 
to perform this location function; namely, table look-up and 
randomization techniques. 
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For the ESDP system, randomization techniques were 
rejected as the basis of the file accessing mode. First, as
noted earlier, the names or keys used in ESDP file accessing are 
dictated by the symbol coding conventions adopted for a user’s 
programming system and his natural language responses to 
interrogation. They cannot be predetermined. No known 
randomization technique exists which can produce satisfactory 
results, given any key set. 

Second, randomizing techniques are useful only for a
single access path to file data (i.e., access through a single
key set). Because of the nature of the data in the ESDP files, 
multiple access paths must be available to the same data. Thus, 
table look-up techniques would be required to handle the 
secondary key sets and access paths. 

Randomizing techniques are more effective in loosely
packed file situations. Efficiency drops sharply as denser 
packing is used. The resultant increase in storage requirements 
cannot be offset by comparable table look-up storage 
requirements. Thus, this technique would unduly tax storage 
requirements. File maintenance also becomes a problem if 
extensive chains develop. 
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The alternative to randomization is some form of table 
look-up which is the method employed in the ESDP file processor. 
Table look-up techniques employing indices have been used in many 
other systems and are the basis of the index sequential access 
method in System/360. Essentially, a table entry is created for 
each name or key used in accessing a file, and an address of the 
appropriate file location is stored with the key. When the table 
is searched, the required storage key can be obtained directly. 
Various searching algorithms can be used depending on the 
ordering of the keys in the table. 

The most efficient searching techniques require an 
ordered (typically alphabetic sort order) table based on key 
characteristics. A major problem arises with these techniques 
when applied to evolving tables or indices. Either strict order 
is maintained by physically rearranging the index when new
entries are inserted or chaining techniques are used. With the 
latter technique, new entries are not inserted in sequence but 
stored separately and a reference inserted at the required point 
in the sequence. To avoid extensive chain processing, file 
maintenance of the indices is periodically required, with the 
frequency of the period dictated by the index growth pattern.

To avoid this maintenance and reorganization problem, 
the ESDP file processor uses a different technique for index 
building and searching, which is a take-off on existing list
processing ideas. 

In the ESDP file system, the index is treated as a
group of entries which are physically strung together into a 
list, not necessarily contiguously, and which are logically 
ordered or sequenced by the use of pointers or address indicators 
which are appended to each entry. Because of this uncoupling of 
the physical and logical ordering of the index (or any list), we 
can eliminate the index reorganization problem, and with some 
other simple techniques, the index maintenance problem. 


A binary tree structure was selected to permit 
efficient search strategies, based on binary search techniques. 
The form of the index entry (or structure node) adopted for the 
ESDP case is shown in Figure 5. Here:


a. The Index Key Field contains the key or name used 
to access file data. This field contains such elements as the 
names of the (1) Units of Programming (UOP); (2) data variable
names; or (3) descriptor terms (i.e., keywords).

b. The Low Sequence Pointer contains the address of 
another index entry whose key is lower in sort sequence than the 
key of the record being examined. Similarly, the High Sequence 
Pointer contains the address of a record whose key is higher in 
sort sequence than the key of the record being examined.

c. The Data Field contains any additional data that is
desired to be stored in the index. For ESDP, this field could be
used for citation lists, disk addresses and allocation and
addressing controls.


Index Key Field

Low Sequence Pointer

High Sequence Pointer

Data Field


Figure 5. Index Entry

This general form was defined to permit the
implementation of a single program to perform index building and
searching of a variety of indices, each of which had a different
specific organization. As typical in list processing, an initial
pointer or 'anchor' is maintained that points to the first index
entry or head of the list.

3. Prototype ESDP Index Implementation. Typically, list
processing techniques have been applied to lists which can be 
maintained in core memory. For the ESDP problem, the file sizes 
and index requirements are too large to justify core resident 
indices; thus, some different techniques had to be employed which 
could operate with a disk resident index. First, indices were 
segmented and these segments were the units for storing and
retrieving from disk. The selected segment size was set at the 
track size of the disk unit used. The following PL/I structure 
declaration defines the segment format used: 

SEGMENT FORMAT

ID is a one byte code identifying all segments in the same
index.
ANC is a pointer that indicates which entry in the segment 
can be used for the next index term to be stored. If the 
value of ANC is zero then the segment has the maximum number
of entries or is full.


Each index segment must be initialized before operation as 
follows: 


a. Each index field (KEE) is set to binary zero. 

b. Each high sequence pointer is set to binary zero. 

c. Each low sequence pointer is set with the subscript 
value of the next entry in the array (MINSTRUCT). LOW 
(N) is set to binary zero.

d. ANC is given the value 1 (i.e., initially the
first segment entry is to be used for the first index
entry in the segment).
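
The initialization rules above amount to threading every unused entry onto a free list through the LOW pointers, with ANC at its head. A minimal sketch follows, assuming a dictionary of parallel lists in place of the PL/I structure and leaving slot 0 unused so that subscripts stay 1-based; the function names are hypothetical.

```python
def new_segment(seg_id: int, n: int) -> dict:
    """Build an initialized segment with n entries (subscripts 1..n)."""
    seg = {
        "ID": seg_id,
        "ANC": 1,                 # first entry is the next one to be used
        "KEE": [0] * (n + 1),     # every key field set to binary zero
        "HIGH": [0] * (n + 1),    # every high sequence pointer set to zero
        "LOW": [0] * (n + 1),
    }
    for i in range(1, n):
        seg["LOW"][i] = i + 1     # each LOW names the next entry in the array
    # LOW(N) is left at zero, so taking the last entry drives ANC to zero
    return seg


def free_slots(seg: dict) -> list:
    """Walk the free list from ANC through the LOW pointers."""
    slots, i = [], seg["ANC"]
    while i != 0:
        slots.append(i)
        i = seg["LOW"][i]
    return slots
```

Reusing LOW for the free chain means an unused entry needs no extra field, at the price that LOW must be cleared when the entry is put into service.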


A generalized index search routine has been written 
which has multiple entry points depending on the number of 
discrete indices. (At present, eight indices are maintained; six 
corresponding to the six UOP levels, one for data definitions and
one for keywords.) Upon access, this routine reads the first 
segment of the specified index, which contains the anchor or 
start of the index list. It begins comparing the passed search 
key against the KEE's (stored keys) in the segment. If this is 
the first entry in the index, then KEE (1) in segment one equals 
binary zero and no match is found. The no match occurs whenever 
the contents of KEE differ from the passed search key. If the 
passed key and the entry in KEE match, then the routine returns 
to the calling program and passes back the subscript value of the 
matching entry and disk address of the index segment, currently 
in core memory. 
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b. If it is less than a threshold value, then the 
value is a subscript to an entry within the index segment, 
currently in core memory. The threshold is currently set at 255
which is the maximum number of index entries permitted in a given 
segment. The search program uses the subscript to pick up the 
next entry and repeat the comparison operation. 

c. If it is greater than the threshold, then the value 
has a double meaning; namely, it contains the disk address of the 
index segment in which the next entry can be found and the
subscript value of that entry within the segment. The search
program uses the disk address to overwrite the current core resident
segment with the new segment. The subscript value is then used 
to pick up the desired entry and repeat the comparison loop. 
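
The pointer test in cases b and c can be sketched as below. The report does not say how a disk address and a subscript share one pointer value, so the packing used here (address times 256 plus subscript) is purely an assumption, as are the names, the threshold value of 255, and the treatment of a zero pointer as the end of the chain.

```python
THRESHOLD = 255   # assumed maximum number of entries in a segment
RADIX = 256       # hypothetical packing factor for cross-segment pointers


def encode(disk_address: int, subscript: int) -> int:
    """Pack a segment's disk address and an entry subscript into one value."""
    return disk_address * RADIX + subscript


def decode(pointer: int):
    """Return (kind, disk_address, subscript) for a LOW or HIGH pointer."""
    if pointer == 0:
        return ("end", None, None)            # assumed: no further entry
    if pointer <= THRESHOLD:
        return ("same_segment", None, pointer)
    disk_address, subscript = divmod(pointer, RADIX)
    return ("other_segment", disk_address, subscript)
```

Only the "other_segment" case costs a disk read, so entries that follow one another in the search order are cheapest when they share a segment.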
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Thus, two returns from the search routine are possible, 
either a match or a no match. In the match case, the calling 
program performs whatever processing is required using the index 
entry and rewrites the index segment to disk upon completion, if 
the specified index entry has been modified. In the no-match 
case, either an error condition exists or the calling program 
wants to add a new index entry. In the former case, some 
appropriate error processing should be performed. In the latter 
case, i.e., index entry load, the calling program is responsible
for finding an empty slot that can be used for the new index
entry. To do this, the ANC field of the index segment is used
since it points to the next available slot in the segment. 

If the ANC value for the index segment, currently in 
core memory, is non-zero, then the new entry can be inserted in 
the current segment and the ANC value is the subscript to this 
available space. Before using the indicated entry space, the 
calling program must replace the current ANC value by the value 
of the LOW pointer in the indicated entry space. Thus, for 
subsequent users, ANC will have an appropriate subscript value 
and continue to point to the next available entry. The new entry 
is initialized as required by the calling program and both LOW 
and HIGH pointers are set to zero, making the new entry a 
terminal node. The returned back pointer value is used to make 
the necessary linkage with the last compared key to preserve the 
logical ordering of the index. 
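
The slot allocation just described amounts to taking the head of a chain of free slots linked through their LOW pointers. A minimal Python sketch follows; the class and function names (Entry, Segment, new_segment, allocate) are illustrative assumptions, and the linkage of the new entry to the last compared key through the back pointer is left out.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str = ""
    low: int = 0     # in a free slot, LOW chains to the next free slot
    high: int = 0


@dataclass
class Segment:
    entries: list    # entries[0] is unused so that subscript 0 means "none"
    anc: int = 0     # subscript of the next available slot; 0 when full


def new_segment(size):
    """Build an empty segment with free slots chained 1 -> 2 -> ... -> size."""
    entries = [Entry() for _ in range(size + 1)]
    for i in range(1, size):
        entries[i].low = i + 1
    # The LOW pointer of the last slot stays zero, as the text requires.
    return Segment(entries=entries, anc=1 if size else 0)


def allocate(segment, key):
    """Take the slot ANC points to; return its subscript, or 0 if full."""
    slot = segment.anc
    if slot == 0:
        return 0
    entry = segment.entries[slot]
    segment.anc = entry.low          # ANC is replaced before the slot is used
    entry.key = key
    entry.low = 0                    # both pointers zero: a terminal node
    entry.high = 0
    return slot
```

For a three-slot segment, successive calls to allocate return 1, 2 and 3, after which ANC is zero and a further call returns 0, signalling that another segment must be found.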

When the value of ANC in the current core-resident
segment equals zero, the current segment is full and cannot
hold a new index entry. Since the LOW pointer of the last entry
in each segment is initialized to zero, when this entry is used,
ANC will pick up a zero value. In this case, empty space must be
found in some other segment of the index. Segments of the index
are retrieved sequentially until a segment is found whose ANC
value is non-zero. Note that the starting point for segment
retrieval is specified by a system parameter, like the anchor
pointer, which gives the disk address of the first segment of the
index which has empty space. As an index is initially built,
this address will advance sequentially; however, whenever an
index entry is deleted, thus creating a hole in the index, this
address will be reset to the segment from which the deletion was
made. Thus, empty space will be reused.
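
The sequential search for a segment with empty space, and the resetting of its starting point on deletion, can be sketched as follows. This is an illustration only: segments are represented simply by a list of their ANC values, list positions stand in for disk addresses, and the names FreeSpaceParameter, locate and note_deletion are assumptions.

```python
class FreeSpaceParameter:
    """Stand-in for the system parameter naming the first segment with space."""

    def __init__(self):
        self.first_free = 0   # list position used in place of a disk address

    def locate(self, anc_values):
        """Scan sequentially from first_free for a segment with ANC != 0."""
        for number in range(self.first_free, len(anc_values)):
            if anc_values[number] != 0:
                self.first_free = number   # the parameter advances as scanned
                return number
        return None   # no room anywhere; the report does not cover this case

    def note_deletion(self, segment_number):
        """A deletion resets the starting point to the segment with the hole."""
        self.first_free = segment_number
```

With ANC values of 0, 0, 4 and 1, locate returns segment 2. If an entry is later deleted from segment 1, note_deletion(1) causes the next locate to return segment 1, so the hole is reused.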




VIII 


DEBUGGING SUPPORT 


1. Introduction. We have described in Volume 3 of this report a
language designed to facilitate programming of conversational
processes. This language, called the CAINT Executive Language
(CEL), is employed in writing executive programs to control the
logic of question selection and wording, response analysis, and
various other activities. CEL-written programs require
debugging, and ESDP is designed to include a support package 
specifically tailored to assist the executive programmer in 
debugging. 

2. Debugging Support Capabilities. We define five basic
debugging capabilities: (1) halt, (2) display, (3) alter, (4)
change, and (5) begin. The capabilities will be presented by
commands of the same name, and it will be possible to embed the
commands in CEL.

For example,

IF (Condition) THEN DISPLAY (Variable) ELSE HALT.

3. Commands. 

a. HALT 

Halt the CEL program and continue with the next 
sequential instruction when the start button on the user's 
console is pressed. 

b. DISPLAY 

Display the contents of named variables (in core or
external storage).

For example, 

A = 1; B = 6; C = A + B; 

DISPLAY (C) ; 

results in a printout of: 

C = 7 

c. ALTER 

Enables the programmer to modify some area of core or 
external storage. 

For example,

Programmer types:

C = A - B (continuing the above example)

C is set equal to -5.

d. CHANGE

Enables the programmer to change statements in the CEL 
program being debugged. 

There are three forms of CHANGE defined: 

(1) CHANGE (DELETE (statement number) TO 
(statement number)) 

(2) CHANGE (REPLACE (statement number) TO 
(statement number) WITH source code) 

(3) CHANGE (INSERT AFTER (statement number) 
source code ) 
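
The effect of the three forms of CHANGE on a numbered program can be illustrated with a short Python sketch. The program is held as a mapping from statement number to source text. The report does not say how new statements are numbered, so the scheme used here (consecutive numbers starting at the replaced statement, or just after the insertion point) is an assumption, as are all the function names.

```python
def _place(program, start, lines):
    """Store new source lines under consecutive numbers (ASSUMED numbering)."""
    for offset, text in enumerate(lines):
        number = start + offset
        if number in program:
            raise ValueError(f"statement number {number} already in use")
        program[number] = text


def change_delete(program, first, last):
    """CHANGE (DELETE first TO last): remove statements first..last."""
    for number in [n for n in program if first <= n <= last]:
        del program[number]


def change_replace(program, first, last, lines):
    """CHANGE (REPLACE first TO last WITH source code)."""
    change_delete(program, first, last)
    _place(program, first, lines)


def change_insert_after(program, after, lines):
    """CHANGE (INSERT AFTER statement-number source code)."""
    _place(program, after + 1, lines)


def listing(program):
    """Return the source text in statement-number order."""
    return [program[n] for n in sorted(program)]
```

For example, starting from statements 100 to 400 holding A = 1; B = 6; C = A + B; DISPLAY (C); a call to change_replace(program, 300, 300, ["C = A - B;"]) swaps the third statement while leaving the others in place.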

4. Use of Debugging Capabilities. Data value changes may be 
traced throughout the execution of a program. 

For example, 

DISPLAY (X) ; 

means display the current value of X every time X 
changes value. 

IF (Y > 5) & (Y < 10) THEN DISPLAY (X);

means display the current value of X every time X
changes, but only if Y is greater than 5 and less than 10. In other
words, rather than inserting this statement in the CEL program
every time that X is changed, the programmer can state the
condition and desired action once, and the command will be in
effect throughout the program execution.
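
The idea of stating a condition and action once and having it apply at every change of value can be sketched in Python as follows. This is only an illustration of the behavior described, not of how ESDP would implement it; the class name Watched and its methods are assumptions.

```python
class Watched:
    """Named variables with DISPLAY-style watches that fire on value changes."""

    def __init__(self):
        self.values = {}
        self.watches = []    # (variable name, condition) pairs
        self.log = []        # every displayed line is also kept here

    def display(self, name, condition=lambda values: True):
        """State once that `name` is to be displayed whenever it changes."""
        self.watches.append((name, condition))

    def set(self, name, value):
        """Assign a value; fire matching watches only if the value changed."""
        changed = name not in self.values or self.values[name] != value
        self.values[name] = value
        if not changed:
            return
        for watched_name, condition in self.watches:
            if watched_name == name and condition(self.values):
                line = f"{name} = {value}"
                self.log.append(line)
                print(line)
```

A watch corresponding to the conditional example above would be registered once with display("X", lambda v: 5 < v.get("Y", 0) < 10), after which every later assignment to X is checked without further statements in the program.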

Another option is deferral of printout. 

For example, 

DISPLAYD (X) ; 

means record all value changes of X and print off-line. 

A particular CAINT application might be a deferred display of all
the questions (in sequence) output for a given run.

During a debug run some means of data base protection
would have to be provided. This might take the form of putting
the data base in a read-only mode for each debug application.
This would mean that any area of the data base could be read, but
writing would be directed to a scratch file, and any attempts to
read "changed" data base records would also be directed to the
scratch file (onto which they had previously been written).

correction. An ALTER command could then be used to reinitialize
data variables to reasonable values for the point at which the
program is restarted (using the BEGIN command).
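
The read-only protection scheme described above, in which writes go to a scratch file and later reads of changed records are served from it, can be sketched in Python as follows. The dictionaries stand in for the data base and the scratch file, and the class name ProtectedDataBase is an assumption.

```python
class ProtectedDataBase:
    """Read-only view of a data base with writes diverted to a scratch area."""

    def __init__(self, records):
        self._records = records   # the real data base; never written here
        self._scratch = {}        # stand-in for the scratch file

    def read(self, key):
        """Serve changed records from scratch, all others from the data base."""
        if key in self._scratch:
            return self._scratch[key]
        return self._records[key]

    def write(self, key, value):
        """Direct every write to the scratch area."""
        self._scratch[key] = value
```

After a debug run the scratch area can simply be discarded, leaving the data base exactly as it was before the run.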

5. Methods of Implementation. There are two general methods to
support the CEL with the debugging capabilities described above: 
compilation and interpretation. 

Some pre-processing would be required to prepare a deck 
for compilation so as to support the debugging capabilities 
described above. This could be coupled with an interrogation 
designed to elicit from the programmer the specific debugging 
requirements for each program. 

Consider a simple example of the kind of preprocessing 
under discussion. 


The programmer writes the following code in which the 
numbers on the left are machine generated statement numbers, to 
be used by the programmer as operands of a CHANGE command. 


S00100   L1: DO;
S00200       IF UOP.NUMENS = 0 THEN CALL OUT ('NO MEMBERS',
S00400          UOPAD); ELSE CALL OUT (MEMLIST, NUMENS);
S00500       IF UOPAD = UOPEND THEN GO TO L2;
S00700       UOPAD = UOPAD + 1;
S00800       CALL NEXT (UOPAD);
S00900       GO TO L1;
S01000   L2: END


The following is an example of a dialogue that could
then take place:

MSG  Type the number of each command to be used
     1. HALT
     2. DISPLAY
     3. ALTER
RES  2

MSG  Which variables are to be displayed?
RES  UOP.NUMENS

MSG  Which variables are to control the display of UOP.NUMENS?
RES  UOP.NUMENS

MSG  For which new values of UOP.NUMENS is UOP.NUMENS to be
     displayed?
RES  UOP.NUMENS > 0

MSG  Pre-processing has begun

MSG  Compilation has begun

MSG  At which statement do you want execution to begin? (Type
     Number 1 - 10).
RES  1

MSG  Execution has begun

Etc.


The deck produced by the pre-processor looks as
follows:

LlL ?: PROC OPTIONS (MAIN);

%INCLUDE DATA (DCL1);    Includes declarations necessary to define
                         data base references in program.

%INCLUDE TEMP (CODE);    Includes code to initialize variables in
                         support of DISPLAY. (This code was
                         generated as a result of the dialogue
                         pictured above.)

%INCLUDE PROS (CHECK);   Includes this system (ESDP) program as
                         internal procedure. This program
                         supports the debugging capabilities
                         described above.

LAB1(1):   L1: DO;
LAB1(2):       IF UOP.NUMENS = 0
                  THEN
LAB1(3):          CALL OUT ('NO MEMBERS', UOPAD);
LAB1(4):          ELSE CALL OUT (MEMLIST, NUMENS);
               CALL CHECK;
LAB1(5):       IF UOPAD = UOPEND
                  THEN
LAB1(6):          GO TO L2;
LAB1(7):       UOPAD = UOPAD + 1;
               CALL CHECK;
LAB1(8):       CALL NEXT (UOPAD);
               CALL CHECK;
LAB1(9):       GO TO L1;
LAB1(10):  L2: END;


In summary:

The HALT command could have been enabled by the same
interrogation process which enabled the DISPLAY command. The
program can still be halted by the programmer by pressing the
stop button on the console.

The CHANGE command can be utilized following a halt by
means of a dialogue with the CHECK routine.

The DISPLAY command has been selectively enabled
through interrogation.

The ALTER command could have been enabled by means of 
interrogation. 

The BEGIN command is supported by embedding the program 
in a label array as shown. 
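
The way a label array supports BEGIN can be illustrated with a small Python sketch in which the program is held as an indexed list of statements and execution may start at any position. This is a simplified model, not the PL/I mechanism itself: each statement is a function that may return a GO TO target, the check routine is called after every statement rather than at selected points, and the names run, init, advance, loop and PROGRAM are assumptions.

```python
def run(statements, state, begin=1, check=None, limit=1000):
    """Execute from 1-based statement `begin`; call check after each one."""
    position = begin
    steps = 0
    while 1 <= position <= len(statements) and steps < limit:
        target = statements[position - 1](state)   # may return a GO TO target
        if check is not None:
            check(position, state)                 # stand-in for CALL CHECK
        position = target if target is not None else position + 1
        steps += 1
    return state


def init(state):
    state["UOPAD"] = 0


def advance(state):
    state["UOPAD"] += 1


def loop(state):
    # GO TO statement 2 until UOPAD reaches UOPEND.
    return 2 if state["UOPAD"] < state["UOPEND"] else None


PROGRAM = [init, advance, loop]
```

Calling run(PROGRAM, {"UOPEND": 3}) executes from the first statement, whereas run(PROGRAM, {"UOPAD": 5, "UOPEND": 3}, begin=2) skips the initialization and resumes with values the programmer has set beforehand, much as ALTER followed by BEGIN would.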

The main differences between compilation and
interpretation are (1) interpretation will entail some
implementation costs whereas the compiler is essentially free,
(2) interpretive program CHANGES can be made more rapidly,
because it is not necessary to recompile and linkage edit, and
(3) interpretive debugging commands can be entered at execution
time.

gation and is a process of eliciting information directly from the programmer, through on-line
communication terminals. The former provides canonical data about the program's structure. The
latter provides explanatory material about all aspects of the program, and in the absence of
canonical data, may provide tentative structural information as well. The conclusion of the study
group is that ESDP is a feasible concept with present-day technology and that it will materially
benefit using organizations in the production of programs and in guiding their evolution as
requirements change. Its value will be greater for larger organizations, whose internal
communications difficulties tend to cause truly gigantic inefficiencies. Its implementation as a support
system for such projects would require a significant quantum of investment in order to produce
these benefits and is predicated on the use of a computer system dedicated solely to the use of
ESDP.